Researchers are using a sensor-input-based metric to develop a team of robots that would 
have the capability to learn their roles and improve strategies so that they can meet their 
overall goals in dynamic unstructured environments such as underwater or urban settings 
in which communications and monitoring are difficult. 


For a robot to operate autonomously in a dynamic 
environment, it must be capable of adapting itself 
without the help of humans. The ultimate goal 
of our research is to provide teams of unmanned 
underwater vehicles (UUVs) some of the abilities 
of animals to adapt to their environment using their 
memories, without requiring exhaustive trial-and-error 
testing or complex modeling of the environment. 

We focus on UUVs because they offer the promise of 
making dangerous tasks such as searching for underwater 
hazards or surveying the ocean bottom safer 
and more economical for government and commercial operations. 
We adopt a team concept to reduce overall mission 
cost, using several low-cost subordinate UUVs to 
augment the sensor capabilities of a higher-capability 
lead UUV. Our goal is to develop a team of robots that 
would have the capability to learn their roles and 
improve team strategies so that the team can meet its 
overall goals in dynamic unstructured environments 
such as underwater or urban settings in which communications 
and monitoring are difficult. 

Our research uses a sensor-input-based metric for success 
combined with a training regimen based on recently 
collected memories (a temporal series of sensor/action 
relationships), in which robots with "ears" listen for a 
leader robot and attempt to follow, and where the ensuing 
formations are a result of emergent behavior. For 
this application, the sensor input is the sound intensity 
in the left and right ear, and the action is to turn left, go 
straight, or turn right, keeping the intensity within a certain 
range in both ears. 

UNDERWATER NAVIGATION 

One of the greatest challenges for underwater 
autonomous operations is navigation. Currently, most precision 
underwater navigation relies on some sort of external 
infrastructure such as surface ships or underwater 
beacons placed in known positions. Subsea navigation 
uses these assets as reference points. However, this limits 
the operation of UUVs to fairly small areas, and some situations 
require assessing an area's environmental or commercial 
attributes before an infrastructure exists. To 
accomplish such tasks, the UUV team must be able to navigate 
to an area, carry out its task, and return, requiring 
expensive and complex navigation systems. Seawater's 
varying physical properties, along with acoustic issues 
such as spreading, reverberation, and multipath, make 
autonomous, noninfrastructure-based underwater navigation 
a difficult task. 

Any mission involving multiple UUVs 
relies on their capability to navigate as a team. A typical 
mission will have many distinct phases, requiring 
smooth transitions between formations. Initially, the UUVs 
will be onboard a host vessel. After the UUVs have been 
sea- and mission-prepped, they will be put into the water 
and will form into a group to travel to the area of interest. 
The current assumption is that at least one vehicle will 
have an advanced positioning system on board and that 
the others will navigate relative to this vehicle. 
Upon arriving at the area of interest, the UUVs will 
change into task-specific formations and execute their 
mission-related goals. When the mission is completed, 
the UUVs will journey back to their host vessels, where 
the collected data will be processed and disseminated. 

Figure 1 shows a team of five UUVs moving in a V-formation. 
The red UUV is the leader, while the green 
UUVs are followers. In this illustration, the follower 
UUVs are assisting the leader by extending its sensor 
footprint, indicated by the yellow ovals; overall costs 
are reduced because the green UUVs do not need the 
same capabilities (long-range communications and navigation) 
as the red UUV. 

The leader/follower concept uses relative navigation 
between vehicles to provide an attractive and effective 
solution, one that is used abundantly in nature 
and in human activities involving multiple vehicles. 
However, because of the undersea environment's constantly 
changing properties (currents, density, bottom 
composition, biofouling, and so forth) and different 
mission-specific payloads, the vehicles must be able to 
adjust their control strategies. 

Our work recognizes these factors and is built upon 
earlier work that simulated formations of neural-network-controlled 
vehicles and then was extended to 
wheeled mobile robots using acoustic sensor systems. 
We focus on memory-based learning algorithms 
designed to reduce the number of trial solutions (environmental 
exploration) the robotic system requires. The 
anticipated benefits include reduction of time for setup 
and calibration of sensor systems and, in the context of 
a feedback-based robot architecture, quicker adaptation 
to changing environments. 

Our work is distinctive because it uses machine-learning 
techniques to learn the control laws to move the 
UUVs into (acquire) formation and maintain (follow) it. 
The system uses machine-learning techniques to learn 
the low-level sensor/action relationships, while it uses 
emergent behavior to construct the formations. Our 
research uses data acquisition methods to generate controllers 
without a priori knowledge or physical repetition 
of candidate solutions. 

MEMORY-BASED LEARNING 

If robots can learn from recent memory, researchers 
can avoid directed testing of trial solutions. By recording 
sensor data/action pairs and actions that optimize 
goals, a robot can create a "draft" controller that 
researchers can iteratively improve as the robot operates 
in its environment. 

Figure 1. UUV team. In this team of five UUVs, the red UUV is the 
team leader. In this leader/follower team concept, the red 
leader UUV contains a high-accuracy navigation system, while 
the green follower UUVs navigate relative to the leader by 
acoustically sensing its position. 

Our random-but-purposeful controller uses sensor 
feedback to continue actions that move it closer to its 
goal. As long as the feedback indicates that the robot is 
meeting its goal, it continues what it is doing; if not, it 
randomly chooses another action. Since the selection of 
the new action is random, examples of correct actions 
are distributed relatively evenly. In the case of a follower 
robot exploring its environment, the left, right, and 
straight examples will be distributed relatively evenly. 
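
The random-but-purposeful idea can be illustrated with a short Python sketch. It keeps the current action while the feedback improves and otherwise picks a different action at random, logging each step as a memory. The `ToyRobot` class, its intensity model, and the step and turn sizes are assumptions made purely for illustration; they are not the controller or robots used in the experiments.

```python
import math
import random

ACTIONS = ("left", "straight", "right")

class ToyRobot:
    """Hypothetical point robot that hears a sound source at the origin."""
    def __init__(self, x, y, heading):
        self.x, self.y, self.heading = x, y, heading

    def intensity(self):
        # Assumed goal measure: louder (larger) when closer to the source.
        return 1.0 / (1.0 + self.x ** 2 + self.y ** 2)

    def act(self, action):
        # Assumed kinematics: turn by a fixed angle, then move a fixed step.
        self.heading += {"left": 0.3, "straight": 0.0, "right": -0.3}[action]
        self.x += 0.1 * math.cos(self.heading)
        self.y += 0.1 * math.sin(self.heading)

def random_but_purposeful(sense, act, steps, rng):
    """Explore and return a list of (before, action, after) memories."""
    memory = []
    action = rng.choice(ACTIONS)
    previous = sense()
    for _ in range(steps):
        act(action)
        current = sense()
        memory.append((previous, action, current))
        if current <= previous:
            # Feedback did not improve: randomly choose another action.
            action = rng.choice([a for a in ACTIONS if a != action])
        previous = current
    return memory

robot = ToyRobot(3.0, 0.0, 0.0)
memory = random_but_purposeful(robot.intensity, robot.act, 200, random.Random(0))
```

The returned memory is the kind of temporal series of sensor/action relationships from which training examples could later be selected.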

Figure 2. Random-but-purposeful controller. In this example 
path, the sound source is at the center of the plot, and the smiley 
faces with antennae denote the robot and its orientation as 
it follows the path. 

Figure 3. Training set. The solid part of the line fits in the rising 
category, the dashed part in the dropping category, the dotted part in 
the stable category, and the thin part near the middle of the time 
series in the fluctuating category. 

Figure 2 shows a graphic example of the robot trajectory 
using the random-but-purposeful training controller. 
In some tests, good examples consisted predominantly 
of either left or right turns; as a result, the generated 
neural-network controller did not learn how to turn in 
both directions. To contend with this deficiency, we used 
a mirroring technique that generated "extrapolated" 
memories where a left turn closer to the source generated 
an identical right turn to complete the training set. 
These training sets were then passed to a simple 
genetic-algorithm training process that generates a 
feed-forward neural-network controller. Learning from data 
collected directly from the operation environment lessens 
calibration time for sensors and equipment. 
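
The mirroring technique lends itself to a brief sketch. Assuming each memory is stored as a (left intensity, right intensity, action) tuple, which is an illustrative representation rather than the format used in the experiments, a mirrored example swaps the two ear readings and flips the turn direction. The function names are hypothetical.

```python
MIRROR_ACTION = {"left": "right", "right": "left", "straight": "straight"}

def mirror(example):
    """Swap the ear readings and flip the turn direction."""
    left, right, action = example
    return (right, left, MIRROR_ACTION[action])

def add_mirrored(examples):
    """Return the examples plus mirrored counterparts not already present."""
    out = list(examples)
    seen = set(out)
    for example in examples:
        mirrored = mirror(example)
        if mirrored not in seen:
            seen.add(mirrored)
            out.append(mirrored)
    return out

# A set containing only a right turn gains the matching left turn.
training_set = add_mirrored([(0.2, 0.5, "right")])
```

Skipping mirrored examples that already exist keeps a symmetric "straight" example from being counted twice.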

REACTIVE CONTROLLER 

For a reactive controller, the actions at time t are based 
solely upon the current sensor readings. Consequently, the 
reactive-learning algorithm uses a training set composed 
strictly of individual sensor/action pairs that are marked 
good if the action results in an increased sensor value or 
bad for a decreased value. The controller examines the 
recent memory generated by the random-but-purposeful 
exploration algorithm for occurrences of these examples. 
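
A minimal sketch of this marking step follows. It assumes each memory record is a (reading, action, next_reading) tuple with a reading given as (left, right) intensities, and it assumes the "sensor value" is the sum of both ears; both choices are illustrative, not taken from the experiments. Records with no change are left unmarked.

```python
def loudness(reading):
    """Assumed scalar sensor value: the sum of both ear intensities."""
    left, right = reading
    return left + right

def label_pairs(memory):
    """Split sensor/action pairs into good and bad by the change they caused."""
    good, bad = [], []
    for reading, action, next_reading in memory:
        change = loudness(next_reading) - loudness(reading)
        if change > 0:
            good.append((reading, action))
        elif change < 0:
            bad.append((reading, action))
    return good, bad

memory = [
    ((0.2, 0.3), "left", (0.3, 0.4)),   # louder afterwards: good
    ((0.3, 0.4), "left", (0.2, 0.2)),   # quieter afterwards: bad
    ((0.2, 0.2), "right", (0.2, 0.2)),  # unchanged: not marked
]
good, bad = label_pairs(memory)
```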

In this process, the positive examples are used to form 
a set of training examples used by a genetic algorithm to 
train a neural network. An intensity filter ensures that 
actions have a direct effect on sensor results. Once this 
cause-and-effect relationship is established, the system 
sorts the examples into positive and negative sets. The 
work presented here uses only the positive examples; 
however, simulations have demonstrated that negative 
examples can also be used by rewarding a neural network 
if it takes a different action than that associated 
with the example. 
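
To make the training step concrete, here is a small sketch of a genetic algorithm fitting a feed-forward network to positive examples. Everything specific in it is an assumption for illustration: the network size (2 inputs, 3 hidden units, 3 outputs), the mutation-only reproduction with elitism, the population settings, and the fitness measure (the number of positive examples whose action the network reproduces). It is not the training process used in the experiments.

```python
import math
import random

ACTIONS = ("left", "straight", "right")
N_IN, N_HID, N_OUT = 2, 3, len(ACTIONS)
N_GENES = N_HID * (N_IN + 1) + N_OUT * (N_HID + 1)

def forward(genes, inputs):
    """Evaluate a tiny feed-forward net whose weights are a flat gene list."""
    pos, hidden, outputs = 0, [], []
    for _ in range(N_HID):
        total = genes[pos + N_IN]  # bias term
        total += sum(genes[pos + i] * inputs[i] for i in range(N_IN))
        hidden.append(math.tanh(total))
        pos += N_IN + 1
    for _ in range(N_OUT):
        total = genes[pos + N_HID]  # bias term
        total += sum(genes[pos + j] * hidden[j] for j in range(N_HID))
        outputs.append(total)
        pos += N_HID + 1
    return outputs

def choose_action(genes, inputs):
    outputs = forward(genes, inputs)
    return ACTIONS[outputs.index(max(outputs))]

def fitness(genes, positives):
    """Count positive examples whose recorded action the net reproduces."""
    return sum(choose_action(genes, s) == a for s, a in positives)

def evolve(positives, rng, pop_size=30, generations=40, sigma=0.5):
    pop = [[rng.gauss(0, 1) for _ in range(N_GENES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, positives), reverse=True)
        parents = pop[: pop_size // 3]
        children = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - 1)]
        pop = [pop[0]] + children  # elitism: the best net survives unchanged
    return max(pop, key=lambda g: fitness(g, positives))

positives = [((0.8, 0.2), "left"), ((0.2, 0.8), "right"), ((0.5, 0.5), "straight"),
             ((0.9, 0.4), "left"), ((0.3, 0.7), "right")]
best = evolve(positives, random.Random(1))
```

Negative examples could be folded into the same fitness function by adding a point whenever the network's action differs from the recorded one.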

One known drawback of this type of controller is that 
it cannot react to trends that occur over time. A classic 
example for the formation-maneuvering task is the forward-backward 
ambiguity: If the amplitude in both ears 
remains the same, a following robot with a reactive controller 
cannot discern whether it is heading directly away 
from or toward the leader; without an additional 
observer to detect this trend, a robot could wander away 
from its leader in the wrong direction. 

TREND-BASED CONTROLLER 

In the trend-based controller, the number of past 
inputs dictates the span of time over which the controller 
will be able to observe trends and also its ability 
to react quickly to sudden changes. The 
motivation for this type of controller was to 
ensure that it could properly handle the forward-backward 
ambiguity. 

As Figure 3 shows, to create a training set for 
this controller, we split the recent memory into 
four categories of sensor-intensity trends: rising 
(R), dropping (D), stable (S), and fluctuating (F). 
The training sets are formed from the trend categories. 
For the rising and stable groups, the 
action taken during each particular run is 
assumed to be correct, so the neural networks in 
training are rewarded when they take this same 
action. However, when the goal parameter is 
dropping, the correct action is defined as the 
action in an adjacent or overlapping rising run 
because it is that action that alters the negative 
energy gain. If there is no adjacent or overlapping 
rising run, that particular dropping run is not used 
in the collection of training examples. The system currently 
does not use fluctuating runs to create training 
examples because there is no programmatic method of 
determining what the correct action is for this state. 
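
The trend-based selection rules can be sketched as follows. The source does not define how runs are detected, so several details here are assumptions: a step counts as rising or dropping when it changes by more than `tol`, a run shorter than `min_len` steps is treated as fluctuating, only adjacent (not overlapping) rising runs are considered, and the following rising run is preferred over the preceding one. All names are hypothetical.

```python
from collections import Counter

def segment_trends(values, tol=0.01, min_len=2):
    """Split a series into (label, start, end) runs over its step differences."""
    def sign(d):
        return "R" if d > tol else "D" if d < -tol else "S"
    signs = [sign(b - a) for a, b in zip(values, values[1:])]
    runs, start = [], 0
    for i in range(1, len(signs) + 1):
        if i == len(signs) or signs[i] != signs[start]:
            label = signs[start] if i - start >= min_len else "F"
            if runs and label == "F" and runs[-1][0] == "F":
                runs[-1] = ("F", runs[-1][1], i)  # merge neighboring F runs
            else:
                runs.append((label, start, i))
            start = i
    return runs

def run_action(actions, start, end):
    """Most common action taken during steps start..end-1."""
    return Counter(actions[start:end]).most_common(1)[0][0]

def build_training_set(values, actions, tol=0.01, min_len=2):
    """actions[i] is the action taken between values[i] and values[i + 1]."""
    runs = segment_trends(values, tol, min_len)
    examples = []
    for k, (label, start, end) in enumerate(runs):
        window = tuple(values[start:end + 1])
        if label in ("R", "S"):
            examples.append((window, run_action(actions, start, end)))
        elif label == "D":
            neighbors = [runs[j] for j in (k + 1, k - 1) if 0 <= j < len(runs)]
            rising = [r for r in neighbors if r[0] == "R"]
            if rising:
                _, r_start, r_end = rising[0]
                examples.append((window, run_action(actions, r_start, r_end)))
        # Fluctuating runs and dropping runs with no rising neighbor are skipped.
    return examples

values = [1.0, 2.0, 3.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0]
actions = ["left", "left", "right", "right", "straight", "straight",
           "left", "right", "left"]
examples = build_training_set(values, actions)
```

In this example the dropping run borrows the "left" action from the rising run that precedes it, and the short alternating steps at the end are grouped into one fluctuating run that contributes nothing.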

EXPERIMENTAL RESULTS 

Simulations have tested line formations with up to 30 
robots. We also have used up to eight robots to test 
binary tree formations using attraction and repulsion. 
To physically simulate UUVs in the underwater environment, 
we used ActivMedia land robots equipped 
with audio transmission and "two-ear" listening systems 
operating semipassively, meaning that the robots 
do not exchange position or bearing and range information. 
Instead, each robot listens for a chirp emitted by 
its leader and steers itself toward the sound by turning 
in the direction of the strongest signal (left or right). The 
system uses a frequency-multiplexed communication 
scheme in which each leader transmits in its own prespecified 
frequency band and followers are assigned to 
a leader by listening in the specified band. Followers use 
the signaling to determine a relative direction to steer 
toward the leader. 
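
The steering rule can be sketched in a few lines. The sketch assumes that the per-band left and right intensities have already been extracted from the microphone signals, and it adds a small dead band for going straight when the ears roughly agree; the band labels, the dead band, and the function names are illustrative assumptions.

```python
def steer(left, right, deadband=0.05):
    """Turn toward the louder ear; go straight when the ears roughly agree."""
    if left - right > deadband:
        return "left"
    if right - left > deadband:
        return "right"
    return "straight"

def follower_action(band_readings, leader_band, deadband=0.05):
    """band_readings maps a frequency-band label to (left, right) intensities."""
    left, right = band_readings[leader_band]
    return steer(left, right, deadband)

# Two leaders chirp in different bands; this follower listens only to band_b.
readings = {"band_a": (0.9, 0.4), "band_b": (0.1, 0.6)}
action = follower_action(readings, "band_b")
```

Because each follower reads only its assigned band, changing the assignment is enough to attach it to a different leader.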

In quantitative testing using a computer simulation, the 
reactive controller performed better but could not solve 
the forward/backward ambiguity, whereas the trend-based 
controller could under some conditions. 

Figure 4 shows successful formation-maneuvering tests 
with a line of three robots using reactive feed-forward 
neural networks. Significantly, the individual robots are 
not aware of the concept of formations; instead, they independently 
follow their assigned leaders based on the relative 
strengths of the chirps received at their microphones. 
The follower robots avoid colliding with their leaders by 
slowing down when sound intensity goes above a threshold, 
indicating that they are very close to the sound source. 

The formations are the result of emergent behavior, a 
global form of behavior that results solely from local, 
or bottom-up, activity. The advantage to using techniques 
based on these ideas is that they do not require 
a central controller, thus saving communications bandwidth, 
increasing robustness, and reducing internal system 
complexity. 


Employing two learning algorithms helps to contend 
with the challenges presented by dynamic unstructured 
environments such as those found under water 
or in an urban environment where central control and 
planning is difficult for classic sense/plan/act systems. 
Learning what was done correctly in short exploration 
periods keeps the amount of required a priori knowledge 
to a minimum. The formations described in these tests 
result from emergent behavior in that no single robot is 
programmed with the concept of a line or other formation. 
Emergent behavior has been used to explain formations 
of birds and fish and is a valuable tool for creating 
complex interactions among many individual entities 
without relying on a centralized control scheme. 